This is an interactive notebook; you can run it locally or in Google Colab.

How to use Weave with Audio Data: An OpenAI Example

This demo uses the OpenAI chat completions API with the GPT-4o Audio Preview model to generate audio responses to text prompts and track them in Weave. For the advanced use case, we leverage the OpenAI Realtime API to stream audio in real time. A video demonstration accompanies this example.

Setup

Start by installing the OpenAI (openai) and Weave (weave) dependencies, as well as the API key management dependency set-env.
%%capture
!pip install openai
!pip install weave
!pip install set-env-colab-kaggle-dotenv -q # for env var
%%capture
# Temporary workaround to fix bug in openai:
# TypeError: Client.__init__() got an unexpected keyword argument 'proxies'
# See https://community.openai.com/t/error-with-openai-1-56-0-client-init-got-an-unexpected-keyword-argument-proxies/1040332/15
!pip install "httpx<0.28"
Next, load the required API keys for OpenAI and Weave. Here, we use set_env, which is compatible with Google Colab's secrets manager and is an alternative to the Colab-specific google.colab.userdata.
# Set environment variables.
from set_env import set_env

_ = set_env("OPENAI_API_KEY")
_ = set_env("WANDB_API_KEY")
Finally, import the required libraries.
import base64
import os
import time
import wave

import numpy as np
from IPython.display import display
from openai import OpenAI

import weave

Audio Streaming and Storage Example

Now we will set up a call to OpenAI's completions endpoint with the audio modality enabled. First, create the OpenAI client and initialize a Weave project.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
weave.init("openai-audio-chat")
Now we will define our OpenAI completions request and add our Weave decorator (op). Here, we define the function prompt_endpoint_and_log_trace. This function has three primary steps:
  1. We make a completion request using the GPT-4o Audio Preview model, which supports both text and audio inputs and outputs.
    • We prompt the model to count to 13 slowly with varying accents.
    • We set stream=True so the audio is returned in chunks.
  2. We open a new output file to which the streamed data is written chunk by chunk.
  3. We return an open file handle to the audio file so that Weave logs the audio data in the trace.
SAMPLE_RATE = 24000  # OpenAI's pcm16 output is 16-bit mono PCM at 24 kHz

@weave.op()
def prompt_endpoint_and_log_trace(system_prompt=None, user_prompt=None):
    if not system_prompt:
        system_prompt = "You're the fastest counter in the world"
    if not user_prompt:
        user_prompt = "Count to 13 super super slow, enunciate each number with a dramatic flair, changing up accents as you go along. British, French, German, Spanish, etc."
    # Request from the OpenAI API with audio modality
    completion = client.chat.completions.create(
        model="gpt-4o-audio-preview",
        modalities=["text", "audio"],
        audio={"voice": "fable", "format": "pcm16"},
        stream=True,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )

    # Open a wave file for writing
    with wave.open("./output.wav", "wb") as wav_file:
        wav_file.setnchannels(1)  # Mono
        wav_file.setsampwidth(2)  # 16-bit
        wav_file.setframerate(SAMPLE_RATE)  # Sample rate (adjust if needed)

        # Write chunks as they are streamed in from the API
        for chunk in completion:
            if (
                hasattr(chunk, "choices")
                and chunk.choices is not None
                and len(chunk.choices) > 0
                and hasattr(chunk.choices[0].delta, "audio")
                and chunk.choices[0].delta.audio.get("data") is not None
            ):
                # Decode the base64 audio data
                audio_data = base64.b64decode(chunk.choices[0].delta.audio.get("data"))

                # Write the current chunk to the wave file
                wav_file.writeframes(audio_data)

    # Return the file to Weave op
    return wave.open("output.wav", "rb")

Testing

Run the following cell. The system and user prompts, as well as the output audio, will be stored in a Weave trace. After running the cell, click the link next to the 🍩 emoji to view your trace.
from IPython.display import Audio

# Call the function to write the audio stream
prompt_endpoint_and_log_trace(
    system_prompt="You're the fastest counter in the world",
    user_prompt="Count to 13 super super slow, enunciate each number with a dramatic flair, changing up accents as you go along. British, French, German, Spanish, etc.",
)

# Display the updated audio stream
display(Audio("output.wav", rate=SAMPLE_RATE, autoplay=True))

Advanced Usage: Realtime Audio API with Weave